ABSTRACT

We present the results of a search for potential transit signals in the first 
three years of photometry data acquired by the Kepler Mission. The targets of 
the search include 112,321 targets which were observed over the full interval and 
an additional 79,992 targets which were observed for a subset of the full interval. 
From this set of targets we find a total of 11,087 targets which contain at least one 
signal which meets the Kepler detection criteria: those criteria are periodicity of 
the signal, an acceptable signal-to-noise ratio, and three tests which reject false 
positives. Each target containing at least one detected signal is then searched 
repeatedly for additional signals, which represent multi-planet systems of
transiting planets. When targets with multiple detections are considered, a total of
18,406 potential transiting planet signals are found in the Kepler Mission dataset. 
The detected signals are dominated by events with relatively low signal-to-noise 
ratios and by events with relatively short periods. The distribution of estimated
transit depths appears to peak in the range between 20 and 30 parts per million,
with a few detections down to fewer than 10 parts per million. The detections
exhibit signal-to-noise ratios from 7.1 σ, which is the lower cut-off for detections,
to over 10,000 σ, and periods ranging from 0.5 days, which is the shortest period
searched, to 525 days, which is the upper limit of achievable periods given the 
length of the data set and the requirement that all detections include at least 
3 transits. The detected signals are compared to a set of known transit events 
in the Kepler field of view, many of which were identified by alternative
methods; the comparison shows that the current search recovery rate for targets with
known transit events is 98.3%. 

Subject headings: planetary systems - planets and satellites: detection 



1. Introduction 



We have previously reported (Tenenbaum et al. 2012) on the results of searching the
first 218 days of Kepler Mission (Borucki et al. 2010) data for potential signals indicative of
transiting planets. In the intervening time, there have been two developments in the search 
for potential exoplanets in the Kepler dataset. First, the algorithms used in the Kepler 
analysis pipeline have undergone dramatic improvements. Second, the data available for 
searching has expanded from 218 days to 1050.5 days. This massive increase in data volume 
makes possible searches for exoplanets with much longer orbital periods, as well as searches 
for extremely small exoplanets with relatively short period orbits. In this study we report 
on the results of searching the current set of Kepler observations with the upgraded analysis
pipeline. This study can be considered as an update of the previous report (Tenenbaum et al.
2012).



1.1. Kepler Science Data 



The operational parameters of the Kepler Mission have been extensively reported (Haas et al.
2010). In brief: the Kepler spacecraft is in an Earth-trailing heliocentric orbit of 372 day
period. Its single instrument, the Kepler photometer, points almost constantly at a 115
square degree region of the sky centered on α = 19h 22m 40s, δ = +44.5°. During science
operations, photometric data is taken in 29.4 minute integrations, known within Kepler as 
"long cadences" (as distinguished from "short cadences," which are 1/30 of a "long cadence" 
and are collected for a small subset of targets). In order to maintain the correct orientation of
the solar panels and thermal radiator, the spacecraft rotates about the photometer boresight 
axis by 90° approximately every 93 days, the interval at a given orientation being referred 
to as a "quarter." A consequence of this rotation is that each target star is observed each 
year on 4 different readout channels on the focal plane. Science acquisition is interrupted 
for monthly downlinking of pixel data, maneuvering from one quarter's attitude to the next, 
reaction wheel desaturation (one 29.4 minute sample is lost for this purpose approximately 
every 3 days), and a variety of spacecraft anomalies. 

The data acquisition period for this analysis begins at 2009 May 12 00:00:00 UTC, 
ends at 2012 March 28 12:47:26 UTC, and contains 51,412 sample intervals of 29.4 minutes. 
Of these, 47,588 intervals are dedicated to science data acquisition, the balance of 3,824 
intervals being consumed by the interruptions listed above. During this period the spacecraft 
performed 11 axial rotations, resulting in 12 quarters worth of data. 

A total of 192,313 targets were observed by Kepler during the 12 quarters of data 
acquisition, and were subsequently searched for indications of transiting planets. Of those, 
112,321 were observed in all 12 quarters; the balance of 79,992 were observed only in a 
subset of quarters. Figure 1 shows the distribution of targets according to the number of
quarters observed. Observation of a target in a subset of quarters can occur for any of three 
reasons. The most significant cause of limited observation is an onboard electronics failure 
which occurred on 2010 January 23, one month into quarter 4: this failure resulted in the 
subsequent loss of all data from 4 of the 84 CCD readouts on the focal plane (specifically, 
the 4 CCD readouts in Module 3). Due to the quarterly rotation of the spacecraft, this 
failure produced a "blind spot" in the Kepler field of view which moves relative to the target 
stars, causing a large number of targets to be visible only 75% of the time. Any target 
which falls onto Module 3 was only observed in 10 out of 12 quarters. The 28,965 stars 
which were observed for 10 quarters as shown in Figure 1 are mainly due to this effect.
A second limitation on the number of quarters for which a target is observed is that the 
process of target selection and prioritization has evolved over the life of the Kepler Mission; 
targets which are added or removed subsequent to Quarter 1 will not be observed during all 
quarters. Additionally, a fraction of Kepler's observing capacity is reserved for use by the 
Kepler Guest Observer (GO) and Asteroseismic Science Consortium (KASC) programs; the 
targets observed in these programs are frequently updated, resulting in a number of targets 
observed for relatively short intervals. Finally, due to small asymmetries in the construction 
of the focal plane, a small number of targets cannot be observed in all spacecraft orientations: 
in some quarters these targets are imaged onto one or another CCD detector, while in some 
quarters the target images fall between the detectors. In total, 28,826 targets were observed 
in 8 or fewer quarters; 43,339 targets were observed for 9 or 10 quarters; and 7,819 targets 
were observed for 11 quarters.

In addition to the aforementioned 192,313 targets which were searched for planets, a
total of 2,123 known eclipsing binaries were observed but not searched for transiting
planet signatures. This was done for operational reasons. The Kepler processing pipeline
has limited capacity to identify circumbinary planets because their transit signatures are 
generally neither periodic nor of constant duration. However, the eclipses of an eclipsing 
binary system mimic planetary transits with sufficient fidelity to be identified by TPS as 
potential signals of transiting planets. These known eclipsing binaries were removed to 
reduce the computational and human burden which would otherwise have been imposed by 
their false-positive detections. 



1.2. Pre-Search Processing 



The processing of pixel data from the Kepler spacecraft, prior to the search for transiting
planet signatures, is summarized elsewhere (Jenkins et al. 2010a). The processing step which
has seen the most dramatic change is Pre-Search Data Conditioning (PDC). The purpose of 
PDC is to remove variations in the flux time series which are generated by changes in the 
spacecraft environment or other systematic effects. The original PDC algorithm determined 
the systematics by performing a robust least-squares fit of assorted spacecraft engineering 
variables to each flux time series, and then subtracting the systematics thus determined 
to yield a conditioned flux time series (Twicken et al. 2010b). While such an approach is
guaranteed to reduce the bulk RMS variation of each target's flux, it can also distort the true 
stellar variations and can even add variability on timescales of interest for planet searches. 
Both of these unwanted side effects are driven by the same source: the least-squares fit is 
removing variability which is coincidentally correlated with some engineering variable, but 
not causally related. 

This unwanted behavior is corrected by applying a Bayesian approach to constrain the 
fitted amplitudes of systematic error terms which are then removed from the light curves. 
This process allows the algorithm to deduce "reasonable" values for the correlation of each 
identified systematic to the light curves, and thus to reject correlations which are wildly 
out of family. Additionally, the ensemble of target star data across a large number of stars 
is used to empirically identify systematic trends in the light curves, rather than relying 
upon the available spacecraft engineering data. The algorithm is fully described elsewhere
(Smith et al. 2012; Stumpe et al. 2012).
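
The effect of constraining fitted amplitudes with a prior can be illustrated with a minimal sketch. This is not the PDC implementation: the Gaussian prior, the function names, and the toy inputs are all assumptions chosen for illustration. With a broad prior the result approaches the ordinary least-squares fit; with a narrow prior, a coincidentally large correlation is pulled back toward the value the prior considers reasonable.

```python
import numpy as np

def least_squares_amplitudes(basis, flux):
    """Unconstrained fit: amplitudes minimizing |flux - basis @ a|^2."""
    amplitudes, *_ = np.linalg.lstsq(basis, flux, rcond=None)
    return amplitudes

def map_amplitudes(basis, flux, prior_mean, prior_sigma, noise_sigma):
    """Maximum a posteriori amplitudes for Gaussian noise and a Gaussian prior.

    prior_mean and prior_sigma are hypothetical inputs standing in for what
    an ensemble of stars would suggest is a reasonable amplitude.
    """
    precision = basis.T @ basis / noise_sigma**2 + np.diag(1.0 / prior_sigma**2)
    rhs = basis.T @ flux / noise_sigma**2 + prior_mean / prior_sigma**2
    return np.linalg.solve(precision, rhs)

# Toy case: one systematic trend (a ramp) and a star whose intrinsic
# variability happens to resemble that ramp.
t = np.linspace(0.0, 1.0, 200)
basis = t[:, None]
flux = 5.0 * t

unconstrained = least_squares_amplitudes(basis, flux)
constrained = map_amplitudes(basis, flux,
                             prior_mean=np.array([0.5]),
                             prior_sigma=np.array([0.1]),
                             noise_sigma=1.0)
broad_prior = map_amplitudes(basis, flux,
                             prior_mean=np.array([0.5]),
                             prior_sigma=np.array([1.0e6]),
                             noise_sigma=1.0)
```

In this toy case the unconstrained fit attributes all of the stellar variation to the systematic, while the constrained fit removes only part of it.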



In addition to the corrections described above, the current PDC algorithm identifies and
corrects the signature of a cosmic ray related artifact known as a Sudden Pixel Sensitivity
Dropout (SPSD). An SPSD occurs when a cosmic ray produces a step reduction in the 
quantum efficiency of a pixel; the reduction is typically of order one percent, and the quantum 
efficiency partially recovers, typically over a period of hours to days. Because an SPSD bears 
a superficial resemblance to a transit signature (at least to a computer), efficient removal of 
SPSDs without inadvertent removal of actual transits is a crucial step in data conditioning 
for Kepler. Unlike environmental signatures, SPSDs are completely uncorrelated from one 
target star to another, and thus are removed from the data via a separate algorithm within 
PDC. 



2. Transiting Planet Search 



The Transiting Planet Search (TPS) algorithm is described in some detail in Jenkins
(2002) and Jenkins et al. (2010b), as well as Tenenbaum et al. (2012). The improvements in
the algorithm since Tenenbaum et al. (2012) are summarized below.



2.1. Edge Detrending of Contiguous Blocks of Flight Data 

The algorithm which was previously used to remove trends at the ends of single-quarter 
data segments was replaced with an algorithm which performs a robust fit of the form: 

    y = P_1 \exp(-x/P_2) + P_3 x + P_4 + P_5 \exp[(x - 1)/P_6],    (1)

where y is the median-corrected flux, x is the sample time normalized to a range from 0 to
1, and P_1 through P_6 are the parameters of the fit. In words, Equation 1 fits a line plus two
exponential edge trends, one at the leading edge of the data region and one at the trailing
edge, with both the amplitude and the time constant of the exponentials as fit parameters.
The form in Equation 1 was found to match the actual edge trends as well as the constrained
polynomial fit which had previously been used. The advantages of the reformulated
edge-trend removal are: a reduced number of assumptions and/or configuration parameters for
the fit; use of the full data segment for the entire fit; robust fitting; and the fact that the new
fit does not under any circumstances introduce a polynomial "wave" into the data segment
in an attempt to correct the edges (i.e., overfitting). Additionally, whereas in the past the
edge detrending was applied only to full quarters of data, in the current implementation 
it is applied at any time when there was an interruption of data acquisition to change the 
spacecraft orientation. This was done to mitigate the thermal transients which occur when 
the spacecraft attitude is changed. Attitude change incidents include all data downlink 
intervals, plus any transitions into or out of safe mode.
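
As an illustration of the functional form in Equation 1, the following sketch evaluates the model for a given parameter set. The function name and the parameter values are hypothetical, and the robust fitting step itself is not shown.

```python
import numpy as np

def edge_trend_model(x, p):
    """Evaluate Equation 1 for normalized time x in [0, 1].

    p holds the six fit parameters (P_1 ... P_6) in order.
    """
    p1, p2, p3, p4, p5, p6 = p
    leading = p1 * np.exp(-x / p2)            # decays away from x = 0
    line = p3 * x + p4                        # linear trend plus offset
    trailing = p5 * np.exp((x - 1.0) / p6)    # grows toward x = 1
    return leading + line + trailing

x = np.linspace(0.0, 1.0, 101)
params = (2.0, 0.05, 0.3, 1.0, -1.5, 0.05)    # hypothetical values
model = edge_trend_model(x, params)
```

With short time constants such as these, the two exponentials affect only the ends of the segment, and the middle of the segment is described almost entirely by the line.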



2.2. Detection and Vetoing of Potential Signals 



The first step in detection of potential signals is described in Section 2 of Tenenbaum et al.
(2012): a wavelet-based, adaptive matched filter is utilized to search for periodic reductions
in flux occurring against the non-white, non-stationary background of stellar variability. The
significance of such a reduction is known as its Multiple Event Statistic. The Multiple Event
Statistic is computed across a two-dimensional grid of signal period and epoch of first
transit, and across 14 trial transit pulse durations; the maximum Multiple Event Statistic from
this set is captured, along with the combination of period, epoch, and transit pulse
duration (henceforth "signal timing") which generated it; target stars for which the maximum
Multiple Event Statistic falls below the specified detection threshold of 7.1 σ are rejected
from further analysis. The requirement that the maximum Multiple Event Statistic exceed
7.1 σ removes from further consideration 76,668 targets, leaving 115,645 with at least one
potential transit signal which lies above this threshold.

The principal weakness of the Multiple Event Statistic calculation is that it cannot 
discriminate between a true train of transit events (which have uniform depth, duration, and 
shape to within the precision limits of the instrument) and a chance combination of dissimilar 
events which coincidentally occur within a flux time series. As an example, consider a flux 
time series for which the combined differential photometric precision (CDPP) for transit
detection is 50 parts per million (PPM) at all times (Christiansen et al. 2012). If the flux time
series contains 4 uniformly-spaced transits with uniform depths of 250 PPM, the resulting
Multiple Event Statistic for that period and epoch will be 10 σ, and will be reported as an
above-threshold event by the Multiple Event Statistic calculation. On the other hand, if the 
4 transits are uniformly spaced but do not have uniform depth - for example, if the depths 
of the 4 transits are 20 PPM, 30 PPM, 50 PPM, and 900 PPM, respectively - the Multiple 
Event Statistic for this combination of events will also be 10 σ, and will also be reported as an
above-threshold event by the Multiple Event Statistic calculation. While the former scenario 
might be the signature of a transiting planet, the latter clearly is not. Thus, a Multiple Event 
Statistic which is above the detection threshold is a necessary but not sufficient condition 
for identifying a potential transiting planet signature. More generally, while the matched 
filter approach is optimal with respect to rejecting the null hypothesis, it is insufficient for 
discrimination between competing alternate models. For this reason, once a Multiple Event 
Statistic above the detection threshold is identified, the event thus detected is subjected to a
series of tests which are designed to discriminate between potential transit signatures and
heterogeneous combinations of unrelated events. These tests accept the former while vetoing
the latter. 
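
The arithmetic of the example above can be reproduced with a short sketch. It assumes that the significance of each individual transit is its depth divided by the CDPP, and that the individual significances combine as their sum divided by the square root of the number of transits; this simplified form is an assumption, chosen because it reproduces the numbers quoted in the text.

```python
import math

def multiple_event_statistic(depths_ppm, cdpp_ppm):
    """Combine per-transit significances, assuming equal noise for every transit."""
    single_event_stats = [depth / cdpp_ppm for depth in depths_ppm]
    return sum(single_event_stats) / math.sqrt(len(single_event_stats))

uniform = multiple_event_statistic([250, 250, 250, 250], cdpp_ppm=50)
mixed = multiple_event_statistic([20, 30, 50, 900], cdpp_ppm=50)
```

Both cases evaluate to approximately 10 σ, which is why the Multiple Event Statistic alone cannot distinguish them.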



2.2.1. Robust Statistic Veto of False Positive Detections 

In the first test used for vetoing false positives, a model light curve consisting of a train 
of model transit pulses is directly fitted to the flux time series. The model transit pulses 
are square waves, with their period, epoch, and duration determined by the signal timing 
produced in the Multiple Event Statistic test. In order to eliminate the effect of stellar
variations, both the flux time series and the model transit pulse train are whitened, as described
in Jenkins et al. (2010b). The fit is performed robustly, in order to reduce the influence of
impulsive outliers in the flux time series; the Robust Statistic, which is the signal-to-noise 
ratio estimated from the fit, is then used to reject false positives. Specifically, a large value 
of the Robust Statistic indicates a detection in which the transits are reasonably uniform 
in depth and duration, which is characteristic of true transit signatures; a small value
indicates that the Multiple Event Statistic has been formed from a combination of heterogeneous
transit-like events with unequal depths, which is characteristic of false positives. A threshold
of 6.4 σ for the Robust Statistic rejects 79,030 targets, leaving 36,614 targets which require
further scrutiny. 
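
The reason a robust estimate separates uniform from heterogeneous events can be seen in a deliberately simplified stand-in. The sketch below is not the TPS Robust Statistic, which comes from a robust fit to the whitened flux time series; it assumes instead that the per-transit depths are already known and uses their median as a simple robust estimate of a common depth.

```python
import math
import statistics

def median_depth_snr(depths_ppm, cdpp_ppm):
    """Signal-to-noise ratio of a common depth estimated by the median.

    A simplified, hypothetical stand-in for a robust fit: a single outlier
    cannot raise the median the way it raises a sum or mean.
    """
    common_depth = statistics.median(depths_ppm)
    return common_depth * math.sqrt(len(depths_ppm)) / cdpp_ppm

uniform_snr = median_depth_snr([250, 250, 250, 250], cdpp_ppm=50)
mixed_snr = median_depth_snr([20, 30, 50, 900], cdpp_ppm=50)
```

For the four uniform 250 PPM transits the estimate stays at 10 σ, while for the heterogeneous set it falls well below the 6.4 σ threshold.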



2.2.2. χ² Veto of False Positive Detections

In the second test used for vetoing of false positives, we take advantage of the fact 
that, given the signal timing and the magnitude of the Multiple Event Statistic, the
expected frequency-domain content of the transits can also be calculated and compared to the
frequency-domain content of the signal identified by the Multiple Event Statistic test. As
shown briefly in Appendix A, and more thoroughly in Seader et al. (2012), knowledge of the
expected frequency-domain distribution of each transit allows construction of two functions,
each of which is expected to be distributed according to a χ² distribution. These functions
are then combined with the Multiple Event Statistic of the potential signal, as shown in
Appendix A. By requiring that the values of the two resulting discriminators, X_(1) and X_(2),
both exceed 7.0, we veto an additional 25,506 targets, yielding 11,108 targets which contain
potential transiting planet signatures. An event which has passed all four tests - Multiple
Event Statistic, Robust Statistic, and χ² discriminators - is referred to as a Threshold
Crossing Event (TCE). 

2.3. Iterative Rejection of False Positives and Re-Searching of the Flux Time Series

Prior versions of TPS suffered from a significant design weakness: in cases in which 
the strongest transit-like feature was vetoed, the search of that target would terminate. In 
this way a strong but low-quality transit-like signal could inadvertently mask a weaker but 
higher-quality event. This flaw is addressed in the current version of TPS: in the event that 
an apparent transit is vetoed, TPS goes on to search for additional transit signatures in 
the same light curve. Because the search of additional periods and epochs can potentially 
be extremely time-consuming, for operational purposes it is necessary to limit the number 
of iterations of searching which are permitted for a given target and a given trial transit 
pulse duration. At present the limit is set to 1000 iterations of re-searching. In the analysis 
reported here, approximately three quarters of all TCEs occurred on the first iteration of 
the search, with the balance of TCEs detected on subsequent iterations. The largest number
of iterations required to detect a TCE was 404. 
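
The control flow described above can be sketched as follows. The data structure, the function name, and the veto callback are hypothetical, and the sketch treats a single ranked list of candidates rather than the per-pulse-duration bookkeeping used in TPS.

```python
def first_surviving_candidate(candidates, passes_vetoes,
                              mes_threshold=7.1, max_iterations=1000):
    """Examine candidates from strongest to weakest until one survives the vetoes.

    Returns (candidate, iteration), or (None, None) if the iteration limit
    or the detection threshold is reached first.
    """
    ranked = sorted(candidates, key=lambda c: c["mes"], reverse=True)
    for iteration, candidate in enumerate(ranked[:max_iterations], start=1):
        if candidate["mes"] < mes_threshold:
            break  # nothing left above the detection threshold
        if passes_vetoes(candidate):
            return candidate, iteration
    return None, None

toy_candidates = [
    {"mes": 30.0, "uniform": False},  # strong but low quality: vetoed
    {"mes": 9.0, "uniform": True},    # weaker but transit-like: accepted
    {"mes": 5.0, "uniform": True},    # below threshold: never examined
]
found, iteration = first_surviving_candidate(toy_candidates, lambda c: c["uniform"])
masked, _ = first_surviving_candidate(toy_candidates, lambda c: c["uniform"],
                                      max_iterations=1)
```

Setting `max_iterations=1` reproduces the earlier behavior, in which the vetoed strong event masks the weaker but higher-quality one.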



2.4. Removal of Non-Periodic Transit-Like Features 

The benefits of the multiple iterations of search, described above, can only be fully 
exploited in the absence of relatively strong non-astrophysical single events. Such strong 
events will cause the Multiple Event Statistic to exceed the 7.1 σ threshold for large numbers
of possible periods: folding a single strong event with a small number of weak events will
produce a large Multiple Event Statistic, and there are an extremely large number of
period-epoch combinations which will result in such a folding. If this happens, the 1000 iterations
of searching can easily be exhausted in the process of eliminating a fraction of the spurious 
Multiple Event Statistics caused by a single strong event. Such an outcome can be avoided 
if these strong events are identified and removed prior to folding, but such removals are 
obviously dangerous: without prior knowledge, a feature in the data which is identified as 
a non-astrophysical event, and removed, could actually be a strong transit. For this reason, 
any event removal must be used sparingly. TPS addresses this issue in two ways. First, a 
minimum number of transits is required for an event to be accepted, since the probability of 
such chance combinations yielding a Multiple Event Statistic over threshold decreases as the 
number of events folded together increases. At present, the threshold number of transits is 3. 
Second, the current version of TPS is permitted to remove one, and only one, single event, 
and only in the case in which the first iteration of planet searching produces a strongest 
event which exceeds the Multiple Event Statistic threshold of 7.1 σ but which is then vetoed
by RS, X_(1), or X_(2). In such a case the strongest single event in the time series is removed,
if and only if the strongest single event has an amplitude which is greater than the Multiple
Event Statistic threshold multiplied by the square root of the minimum number of transits
(7.1 σ × √3, or 12.3 σ for the current parameter choices). Out of the 11,108 TCEs, 2,193 are
found on light curves which have had such a feature removed. Additionally, on each target 
the number of such identifiable features is counted and recorded, regardless of whether any 
such events are removed. Out of all 192,313 targets, the number which have at least one 
identifiable strong single event is 46,481. Figure 2 shows the distribution of the number of
strong events for targets which have at least one such event. Note that the distribution is 
strongly peaked towards small numbers of events, implying that it is worth considering the 
option of using more aggressive removal of features in future TPS runs. 
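
The single-event removal threshold quoted above follows directly from the two configuration values. A minimal sketch, with a hypothetical function name:

```python
import math

def single_event_removal_threshold(mes_threshold=7.1, min_transits=3):
    """Amplitude (in sigma) above which the strongest single event may be removed."""
    return mes_threshold * math.sqrt(min_transits)

removal_threshold = single_event_removal_threshold()
```

An event this strong would, on its own, carry a fold of the minimum number of transits over the detection threshold even if the other events contributed nothing.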



2.5. Limitation on Allowable Transit Duty Cycles 

During development of the most recent version of TPS, it was observed that a substantial 
number of false positives were produced with a short period and a long trial transit pulse 
duration. In the processing run reported on here, we limit the ratio of the trial transit pulse 
duration to the detected period of the signal to be less than 0.16 in order to mitigate this 
category of false positive detections. By comparison, the transit duration to transit period 
ratio for a central transit of the Earth is approximately 7.4 × 10^-4.
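
The duty cycle limit can be expressed as a simple check. The function names and the example durations are hypothetical; only the 0.16 limit and the 0.5 day minimum period come from the text.

```python
def transit_duty_cycle(duration_hours, period_days):
    """Ratio of the trial transit pulse duration to the detected period."""
    return duration_hours / (period_days * 24.0)

def passes_duty_cycle_limit(duration_hours, period_days, max_duty_cycle=0.16):
    """True if the duration-to-period ratio is below the allowed limit."""
    return transit_duty_cycle(duration_hours, period_days) < max_duty_cycle

# At the shortest searched period of 0.5 days (12 hours):
short_pulse_ok = passes_duty_cycle_limit(1.5, 0.5)   # ratio 0.125
long_pulse_ok = passes_duty_cycle_limit(3.0, 0.5)    # ratio 0.25
```

Under these assumptions, a 1.5 hour pulse at a 0.5 day period is retained, while a 3 hour pulse at the same period is excluded.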



2.6. Detection of Multiple Planet Systems 



In Wu et al. (2010), the process for detection of multiple planet systems is described. In
brief, for each target star which yields a valid detection as described above, a planet model is 
fit to the flux time series, using the period and epoch of the TCE as a starting point for the 
fit; the transit signatures from the fitted planet model are removed from the flux time series; 
and the residual flux time series is then searched for additional TCEs. The subsequent TCE 
search is performed using the same TPS algorithm as is used for the initial search. When 
multiple planet detections are included, the total number of TCEs increases to 18,427. 

Following the detection and model fitting described above, an additional set of
automated analyses are performed which allow astrophysical false positives, such as background
eclipsing binaries, to be ruled out. For the purposes of the discussion below, we will consider 
only the TCEs for which the additional automated analyses were successfully completed: 
this set includes 18,406 TCEs falling on 11,087 targets. Of the 21 excluded targets, 19 are
non-stellar "super-aperture" targets, for which the automated post-detection analyses cannot be
performed, while 2 are conventional Kepler targets for which the automated post-detection
analyses failed due to software errors. Each of the excluded targets produced a single TCE. 
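
The remove-and-re-search step of the multiple planet search can be illustrated with a toy model. The sketch below assumes box-shaped transits and noise-free flux, whereas the pipeline fits a physical planet model; all function names and parameter values are hypothetical.

```python
import numpy as np

def box_transit_model(times, period, epoch, duration, depth):
    """Box-shaped transit train: -depth inside each transit, 0 elsewhere.

    epoch is the time of the center of the first transit.
    """
    phase = (times - epoch + 0.5 * period) % period - 0.5 * period
    return np.where(np.abs(phase) <= 0.5 * duration, -depth, 0.0)

def remove_fitted_planet(flux, model):
    """Subtract the fitted transit signature, leaving the residual to re-search."""
    return flux - model

times = np.arange(0.0, 100.0, 0.02)                       # days
first = box_transit_model(times, 10.0, 3.0, 0.2, 500.0)   # stronger signal
second = box_transit_model(times, 17.0, 5.0, 0.3, 80.0)   # weaker signal
flux = first + second

residual = remove_fitted_planet(flux, first)
```

After the first signal is subtracted, the residual contains only the second, weaker signal, which can then be searched with the same algorithm.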



3. Detected Signals of Potential Transiting Planets 

Figure 3 shows the epoch and period of the 18,406 detections, with period in days
and epoch in Kepler-Modified Julian Date (KJD), which is Julian Date - 2,454,833.0. While
Figure 3 is relatively free of obvious artifacts, there is an evident overabundance of detections
at periods of approximately one year. Figure 4 shows the distribution of periods from Figure
3; the overabundance is even clearer here, with 2,042 TCEs with periods between 300 and
400 days as compared to 305 TCEs with periods of 200 to 300 days and 168 TCEs with 
periods of 400 to 500 days. 
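
The KJD convention defined above is a fixed offset from Julian Date. A minimal sketch of the conversion, with hypothetical function names:

```python
KJD_OFFSET = 2454833.0  # Julian Date corresponding to KJD = 0

def jd_to_kjd(julian_date):
    """Convert Julian Date to Kepler-Modified Julian Date."""
    return julian_date - KJD_OFFSET

def kjd_to_jd(kjd):
    """Convert Kepler-Modified Julian Date back to Julian Date."""
    return kjd + KJD_OFFSET
```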

Figure 5 shows the participation of the various detector channels on the Kepler focal
plane in TCEs with periods between 300 and 400 days: each sub-image shows one quarter, 
and the relative intensity of each channel represents the participation, of that channel in that 
quarter, in the 2,042 TCEs. A small number of channels are disproportionately involved in 
these TCEs, mainly channels which are known to suffer from excess noise due to issues in
the readout electronics (Gilliland et al. 2011; Caldwell et al. 2012). As Kepler rotates each
quarter, certain stars will typically be imaged onto one of these misbehaving channels once
per year; this will result in detections on those stars with periods of approximately 1 year. 
Efforts to manage the excess noise of these channels in Kepler data processing are ongoing. 

Figure 6 shows the distribution of detections in the plane of orbital period and
Multiple Event Statistic. Note that the overabundance of detections at one year is completely
dominated by relatively weak signals. Figure 7 shows the distribution of Multiple Event
Statistics: on the left is the distribution of 17,547 detections with Multiple Event Statistic
less than or equal to 100 σ; the right panel shows the same but for the 15,007 detections
with Multiple Event Statistic less than or equal to 20 σ. Figure 8 shows the distribution
of detected periods: on the left are the 5,043 detections with periods over 15 days, on the
right are the 13,363 detections with periods less than 15 days. As compared to Figure 6
in Tenenbaum et al. (2012), the right side of Figure 8 is far more strongly peaked towards
short periods. Note that, in addition to the excess of detections with periods close to 1 year, 
there is a smaller excess of detections with periods of 0.5 years. This peak is caused by the 
presence of two high-noise channels which are located symmetrically opposite one another 
on the focal plane, specifically Module 17, Output 2, and Module 9, Output 2: stars which 
are imaged onto one of these channels will be imaged onto the other 6 months later. 

Figure 9 shows the distribution in estimated transit depths. These depths are
estimated from the event statistics and the noise properties of each light curve, as described in
Tenenbaum et al. (2012). The top plot shows the 16,095 signals which have estimated depths
of 1,000 parts per million (PPM) or less; the bottom plot shows the 8,060 cases with
estimated depths of 100 PPM or less. These sub-distributions contain 87.4% and 43.8%,
respectively, of all the detections in this dataset. Comparing to the same transit depth ranges
in Tenenbaum et al. (2012), we find that in the processing of the first 3 quarters of data the
totals were 72.3% and 13.2%, respectively. This increased sensitivity to weaker transits is
driven in the main by the vastly increased amount of data collected since the end of Quarter
3.
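
The quoted percentages follow from the counts given in the text, as the following short check shows. The variable names are illustrative only.

```python
total_detections = 18406
depth_le_1000_ppm = 16095   # estimated depth of 1,000 PPM or less
depth_le_100_ppm = 8060     # estimated depth of 100 PPM or less

percent_le_1000 = 100.0 * depth_le_1000_ppm / total_detections
percent_le_100 = 100.0 * depth_le_100_ppm / total_detections
```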

Figure 10 shows the distribution of transit duty cycles for all detections, where the
transit duty cycle is defined to be the ratio of the trial transit pulse duration to the detected 
period of the transit (effectively, the fraction of the time during which the TCE is in transit). 
The top plot shows all 18,406 TCEs, while the bottom plot shows the 7,721 TCEs with transit 
duty cycle below 0.04. Figure 11 shows the relationship between period and transit duty cycle
for all 18,406 detections. As expected, the relationship is quantized due to the quantization 
of trial transit pulse durations utilized in the TPS detection algorithm, and as a consequence 
of this quantization the period and transit duty cycle are inversely proportional for a given 
trial transit pulse duration. Figure 11 also demonstrates why there is an abundance of
events with transit duty cycles of approximately 0.002 shown in Figure 10: this is actually a
reflection of the abundance of events with periods near 1 year, for which the possible transit
duty cycles are all in the realm of 1 × 10^-4 to 0.002.

Detailed information on all TCEs which contributed to this analysis can be found at
the NASA Exoplanet Archive: http://exoplanetarchive.ipac.caltech.edu/



3.1. Comparison with Known Kepler Objects of Interest (KOIs) 

In order to gauge the performance of TPS as a detector of periodic transit-like
phenomena, it is necessary to compare the set of TCEs to a set of known events which can function
as a "ground truth". For this purpose, we use the list of Kepler Objects of Interest (KOIs).
Out of the current set of KOIs (Burke, C.J. et al. 2012, in preparation), we have selected
2,630 KOIs which are judged reasonable for comparison to the TCE list: these are KOIs for 
which the signal to noise ratio is high enough to permit detection in TPS, the number of 
transits which fall within the 12 quarters of Kepler data is 3 or more, and which do not fall on 
targets which were excluded from TPS processing. The selected set of KOIs includes planet 
candidates, known astrophysical false positives (mainly eclipsing binaries and background 
eclipsing binaries), and objects which have not yet been characterized as planetary or
non-planetary; for the purpose of the comparison, it is sufficient that each KOI be reasonably
expected to produce a TCE. 

The comparison of the KOI and TCE lists is complicated by the fact that any target 
star can have multiple KOIs and/or multiple TCEs, and the multiplicities of the two are 
obviously not guaranteed to agree. As a first step, we compared the number of TCEs on 
each KOI target star with the number of KOIs on those stars. The result of this comparison 
is as follows: 

• A total of 31 KOIs do not have a corresponding TCE 

• The remaining 2,599 KOIs were matched one-for-one by TCEs which occurred on the 
same target stars 

• 337 KOI target stars produced more TCEs than their known KOIs, resulting in a total 
of 438 TCEs which fall on KOI targets but are not matched by known KOIs. 

3.1.1. Failure to Detect Short-period KOIs due to Data Artifacts 

Subsequent analysis of the KOIs which were not matched by TCEs showed that 21 
out of the 31 had relatively short periods, typically under 2 weeks. Figure 12 shows the
maximum Multiple Event Statistic as a function of period for a selected target in this group. 
The period and Multiple Event Statistic of the KOI on this target star is indicated with 
a marker in the plot. As shown in Figure 12, the Multiple Event Statistic is dramatically
and systematically larger for long periods than for short periods, with a gross pattern of the 
Multiple Event Statistic rising as the square root of the period. 

The root cause of this pattern is a small number of strong transit-like data anomalies 
which are randomly distributed amongst the flux time series. During the folding process 
which results in Figure 12, the anomalies are combined with background noise to produce
strong Multiple Event Statistics. For short periods, the number of events folded together is 
large, thus there are many background noise events combined with a single data anomaly; 
as a result, the Multiple Event statistic is relatively small due to the dilution from the many 
background noise events. For long periods, because the number of noise events is small, the 
data anomaly is relatively undiluted and the resulting Multiple Event Statistic is relatively 
large. 
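
The square-root scaling can be illustrated with a minimal sketch. It assumes an idealized model that is not taken from the TPS code: one anomaly is folded with equally weighted noise events that average to zero, so the combined statistic is the anomaly's single-event value divided by the square root of the number of folded events. The function name and the 1,050-day baseline are illustrative assumptions.

```python
import math

def diluted_anomaly_mes(anomaly_ses, period_days, baseline_days=1050.0):
    """Expected MES when one anomaly is folded with pure-noise events.

    Idealized model (an assumption, not the TPS implementation): every folded
    event has equal weight, the noise events average to zero, and the number
    of folded events is the number of whole periods that fit in the baseline.
    """
    n_events = max(1, int(baseline_days // period_days))
    return anomaly_ses / math.sqrt(n_events)

# The same 50-sigma anomaly folded at a short and at a long trial period.
short = diluted_anomaly_mes(50.0, period_days=3.0)     # 350 events folded
long_ = diluted_anomaly_mes(50.0, period_days=350.0)   # 3 events folded
```

In this toy model the anomaly folded at a 3-day period is diluted to below the 7.1 σ detection threshold, whereas at a 350-day period it remains well above it.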

A Multiple Event Statistic which is composed of one strong transit-like anomaly and 
multiple non-transit-like background noise signals will not survive the Robust Statistic and 
chi-square vetoes, as it does not match the quantitative signatures of a true transit pulse 
train which those vetoes require. Unfortunately, as noted above, the ability of TPS to 
reject large numbers of such false detections in a single light curve has been limited for 
reasons of computational performance: the 1,000 combinations of period and transit epoch 
which produce the strongest Multiple Event Statistics are searched, after which the search 
algorithm declares that no transit signatures were found. In the case of a target such as the 
one selected for Figure 12, the 1,000 strongest signals are all at the long-period end of the 
distribution, and the search iterations are exhausted before the actual signal at 3.766 days 
is examined. 
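
The effect of the iteration limit can be sketched as follows. This is a simplified illustration rather than the TPS search loop: the function name, the candidate tuples, and the stand-in veto function are hypothetical, and the toy light curve simply places 1,500 anomaly-driven peaks above one weaker real signal whose values are those quoted for the Figure 12 target.

```python
def search_with_iteration_cap(candidates, passes_vetoes, max_iterations=1000):
    """Return the strongest candidate that passes the vetoes, or None.

    candidates: list of (mes, period_days) tuples.
    passes_vetoes: callable taking a candidate and returning True or False.
    Candidates are examined in order of decreasing MES, and the search gives
    up once max_iterations candidates have been examined.
    """
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    for candidate in ranked[:max_iterations]:
        if passes_vetoes(candidate):
            return candidate
    return None

# Toy light curve: 1,500 long-period false alarms, all stronger than the
# real short-period signal (MES 11.66 at 3.766 days).
false_alarms = [(20.0 + 0.01 * k, 300.0 + 0.1 * k) for k in range(1500)]
real_signal = (11.66, 3.766)

def is_real(candidate):
    # Stand-in for the Robust Statistic and chi-square vetoes.
    return candidate == real_signal

capped = search_with_iteration_cap(false_alarms + [real_signal], is_real)
uncapped = search_with_iteration_cap(false_alarms + [real_signal], is_real,
                                     max_iterations=2000)
```

With the 1,000-iteration cap the search returns nothing, because the real signal ranks 1,501st; with a larger cap it is recovered.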

In the limit where strong transit-like data anomalies are distributed uniformly and 
randomly throughout the dataset, there will inevitably be some targets for which early 
quarters of data contain no anomalies but later quarters contain one or more. If such a target 
also contains a short-period, low-intensity transit signature, then the transit signature will be 
detectable only so long as the data used for the detection was entirely acquired prior to the 
first anomaly occurrence. This appears to be the case for the 21 instances of short-period, 
low-intensity KOIs which were not detected by the most recent TPS run. Note that this is 
one of those unusual situations in which a 12-quarter dataset does not permit detection of a 
signal which was apparent in a 3- or 6-quarter dataset. 



3.1.2. Matching of KOI and TCE Ephemerides 

Detection of a TCE on a KOI target is a necessary but not sufficient condition to 
determine that the TCE is a detection of the KOI. An additional requirement is that the 
TCE and KOI are referring to the same transit signature. This is typically best determined 
by matching the ephemerides of the two signatures. For this purpose we use an 
ephemeris-matching calculation described in Appendix B. The resulting match parameter varies from a 
value of zero, indicating no match whatsoever, to a value of one, indicating a perfect match 
within the limits of the Kepler data and data processing algorithm. In the case of a target 
star which has multiple KOIs and/or multiple TCEs, it is necessary to attempt to correctly 
match each KOI with the corresponding TCE. A subtlety in this process is that it is at 
least conceivable that multiple KOIs will be best matched by the same TCE. For example, 
consider a target which has two KOIs, with periods of 0.5 and 1.0 years, and three TCEs, 
with periods of 0.5, 0.1, and 0.03144 years. Depending on the detailed transit timings, it is 
at least conceivable that the TCE with the 0.5 year period will be the best match out of the 
3 TCEs for both the 0.5 year and 1.0 year period KOIs. In order to ensure that each TCE 
is paired with one and only one KOI, the following approach is used: 

• Compute the ephemeris matches for all $n_{\rm KOI} \times n_{\rm TCE}$ possible matches between KOI 
and TCE 

• Find the best match in that matrix, and pair the corresponding KOI and TCE with 
one another 

• Eliminate both the KOI and the TCE which have now been paired 

• Repeat the exercise with the remaining $(n_{\rm KOI} - 1) \times (n_{\rm TCE} - 1)$ possible matches, and 
iterate until either the number of TCEs or the number of KOIs on the given target 
star is exhausted. 
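
The pairing procedure above can be sketched as a greedy assignment over the match matrix. The function name and the numerical match values below are hypothetical, chosen only to mimic the two-KOI, three-TCE example in the text; ties are broken by lowest index, which is an assumption.

```python
def greedy_pairing(match):
    """Pair KOIs with TCEs by repeatedly taking the best remaining match.

    match[k][t] is the ephemeris-match value between KOI k and TCE t.
    Returns a list of (koi_index, tce_index, match_value) tuples; each KOI
    and each TCE appears in at most one pair.
    """
    free_kois = list(range(len(match)))
    free_tces = list(range(len(match[0]))) if match else []
    pairs = []
    while free_kois and free_tces:
        # Best entry among the rows and columns not yet paired.
        k, t = max(
            ((k, t) for k in free_kois for t in free_tces),
            key=lambda kt: match[kt[0]][kt[1]],
        )
        pairs.append((k, t, match[k][t]))
        free_kois.remove(k)
        free_tces.remove(t)
    return pairs

# Rows: KOIs with periods of 0.5 and 1.0 years.
# Columns: TCEs with periods of 0.5, 0.1, and 0.03144 years.
example = [
    [1.0, 0.2, 0.05],
    [0.5, 0.1, 0.02],
]
pairs = greedy_pairing(example)
```

Here the 0.5 year TCE is the best match for both KOIs, but it is consumed by the 0.5 year KOI in the first pass, so the 1.0 year KOI is left with its best remaining (poor) match.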

Figure 13 shows the value of the ephemeris match between each of the 2,599 KOIs and 
the TCE on that star which provided the closest match. The values in Figure 13 are sorted 
into descending order. Of the 2,599 match values, only 104 are less than 1.0, with 2,495 
identically equal to 1. Of these 104 cases, 91 are either harmonic mismatches between the 
TCE and the KOI (especially in cases where the KOI period is under the 0.5 day minimum 
period used in TPS) or cases in which the KOI timing was determined using only data from 
early quarters, resulting in errors when extrapolating the timing to the full 12 quarters used 
in this analysis. The remaining classes of discrepancy between TCE and KOI are as follows: 

• In 8 cases, transit timing variations (TTV) cause confusion for TPS, which is explicitly 
designed to find periodic transit signatures; this generally results in a tremendous 
period mismatch between the KOI timing and the TCE, since TPS will usually detect 
a tiny subset of all transits. 

• In 3 cases, the KOI and the TCE have inconsistent transit timing signatures, but both 
signatures appear valid. In each of these cases it is assumed that TPS has identified 
a heretofore-unknown transit signature on the KOI target, but then failed to detect 
the known KOI during the multiple-planet search which followed detection of the new 
TCE. For this reason, these cases are classified as failures of the TPS algorithm to 
recover the known KOIs. 

• In 2 cases the KOI timing clearly produces a transit signature and the TCE timing 
clearly does not. 

3.1.3. Conclusion of TCE-KOI Comparison 

Out of 2,630 KOIs which could be expected to produce TCEs, 44 were not recovered 
as TCEs. This includes 31 cases in which there was no TCE and 13 cases in which a TCE was 
produced but the timing of the TCE did not match the timing of the KOI, even when "near 
misses" such as harmonic or sub-harmonic detection are taken into account. This yields a 
KOI recovery rate of 2,586 out of 2,630, or 98.3%. 
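
As a check on the bookkeeping, the recovery rate can be recomputed from the counts given in the text. The variable names are illustrative.

```python
n_koi = 2630                 # KOIs expected to produce a TCE
no_tce = 31                  # KOIs with no TCE at all
timing_mismatch = 8 + 3 + 2  # TTV cases, missed after a new TCE, invalid TCE timing

recovered = n_koi - no_tce - timing_mismatch
recovery_rate = recovered / n_koi
```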

3.1.4. Transit Duty Cycle of TCEs Matched to KOIs 

Figure 14 shows the distribution of TCE transit duty cycles for the 2,495 cases in which 
the TCE-KOI ephemeris match is identically equal to 1, as well as the distribution for the 
2,205 cases in which the ephemeris match is identically equal to 1 and the transit duty 
cycle is below 0.04. When compared to Figure 10, which shows the transit duty cycle for 
all TCEs, two differences are instantly apparent. First, and least surprisingly, the spike in 
transit duty cycle values around 0.002 which is visible in Figure 10 is absent from Figure 
14. This is because the spike in the former is due to the spurious, anomaly-driven detections 
at 1 year period which are caused by CCD readouts with unusually strong noise properties; 
these spurious detections are not present in the set of KOIs, thanks to the greater degree 
of scrutiny on KOIs which allows elimination of such false detections. Second, the KOI 
transit duty cycle distribution shows a monotonic reduction in the number of KOIs as the 
transit duty cycle is increased; the TCE distribution shows a reduction from 0.01 to 0.04 
transit duty cycle, and an increase from 0.04 to 0.16. Quantitatively, while 58% of all TCE 
detections in Figure 10 have a transit duty cycle of 0.04 or greater, only 12% of all KOIs in 
Figure 14 have a transit duty cycle above 0.04. The implication is that the long transit duty 
cycle TCEs are most likely dominated by false positive detections, and that further reduction 
in the maximum allowed transit duty cycle from the current value of 0.16 would result in 
further reduction of the fraction of false positive TCEs, though of course some study would 
be needed to determine an optimum threshold for the transit duty cycle. 

4. Conclusions 

The Kepler Transiting Planet Search (TPS) algorithm has been run on 192,313 tar- 
gets in the Kepler field of view, including 112,321 targets which have been observed near- 
continuously for the first 12 quarters of the mission. Potential signals of transiting planets 
were detected on 11,087 of these targets. When subjected to further searches for multiple 
planets, the total number of detected signals grew to 18,406. Comparison with a known 
and vetted set of transit-like astrophysical signatures, the Kepler Objects of Interest (KOIs), 
demonstrates that within the parameter regime of the search algorithm and the KOIs the 
recovery rate of known events is 98.3%. 

A. Threshold Crossing Event Vetoes using Chi-Square Discriminators 

The basic idea behind the construction of the test statistic is to break up the matched 
filter output into several contributions and compare each contribution with what is expected. 
What follows in this paragraph is taken from Allen (2004) for completeness. Mathematically 
we have, 

$$ z = \sum_{j=1}^{b} z_j , \tag{A1} $$

where the $z_j$ are additive chunks of the filter output that when added together reproduce 
exactly the output value of the filter, here denoted $z$. Next consider the $b$ quantities defined 
by 

$$ \Delta z_j = z_j - q_j z , \tag{A2} $$

where 

$$ \sum_{j=1}^{b} q_j = 1 , \tag{A3} $$

and the $q_j$ are the expected fractional contributions to $z$ from the $j$'th contribution. The $\Delta z_j$ 
are then the set of differences between the $b$ actual contributions and expected contributions. 
By definition, the $\Delta z_j$ sum to zero, 

$$ \sum_{j=1}^{b} \Delta z_j = 0 , \tag{A4} $$

and their expectation values vanish, 

$$ \langle \Delta z_j \rangle = 0 . \tag{A5} $$

The $\chi^2$ statistic is then defined as 

$$ \chi^2 = \sum_{j=1}^{b} \frac{(\Delta z_j)^2}{q_j} . \tag{A6} $$


Note that with some basic assumptions on the detector noise (namely, that the noise after 
whitening is zero mean, unit variance, and uncorrelated) the expectation values of these 
quantities are independent of whether or not a signal is present in the data, making this an 
ideal discriminator for noise events. 
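
A minimal sketch of this construction is given below. It assumes the standard form of the statistic, in which each squared difference between actual and expected contribution is weighted by the inverse of the expected fraction; the function name and the example values are illustrative.

```python
def chi_square_discriminator(chunks, expected_fractions):
    """Chi-square statistic for a filter output split into additive chunks.

    chunks: the actual contributions z_j, which sum to the filter output z.
    expected_fractions: the expected fractional contributions q_j (sum to 1).
    """
    if abs(sum(expected_fractions) - 1.0) > 1e-9:
        raise ValueError("expected fractions must sum to 1")
    z = sum(chunks)
    return sum((z_j - q_j * z) ** 2 / q_j
               for z_j, q_j in zip(chunks, expected_fractions))

q = [0.25, 0.25, 0.25, 0.25]
signal_like = chi_square_discriminator([1.0, 1.0, 1.0, 1.0], q)   # evenly spread
glitch_like = chi_square_discriminator([4.0, 0.0, 0.0, 0.0], q)   # one chunk only
```

Both inputs have the same total output z = 4, but the statistic vanishes when the output is spread as expected and is large when a single chunk carries all of it.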

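This concept can now be applied in various ways to the components of the single event 
statistics and multiple event statistics. Beginning with the single event statistic time series 
$z(n)$: 

$$ z(n) = \frac{N(n)}{\sqrt{D(n)}} , \tag{A7} $$

where 

$$ N(n) = \sum_{i=1}^{M} N_i(n) = \sum_{i=1}^{M} 2^{-\min(i,\,M-1)} \left[ \frac{x_i}{\sigma_i^2} * s_i \right](n) , \qquad D(n) = \sum_{i=1}^{M} D_i(n) = \sum_{i=1}^{M} 2^{-\min(i,\,M-1)} \left[ \sigma_i^{-2} * s_i^2 \right](n) , \tag{A8} $$
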

where $M$, $x_i$, $\sigma_i$, and $s_i$ are defined in Appendix A of Tenenbaum et al. (2012). Qualitatively, 
the time series $N(n)$ represents the amplitude of a transit-like signal centered at sample $n$, 
$D(n)$ represents the square of the noise limit for detecting a transit-like signature at sample 
$n$; $z(n)$ therefore represents the significance of a transit-like signature detected at sample 
$n$. Equation A8 also defines quantities $N_i$ and $D_i$: these are the contributions to $N$ and $D$, 
respectively, from frequency band $i$. Choosing a particular point in transit duration, period, 
and epoch space, $\{D, T, t_0\}$, selects out a set of data samples $\{A\}$, one for each transit, that 
start with the sample corresponding to the epoch $t_0$ and are spaced $T$ samples apart. These 
cadences form a subset of $\{n\}$, $A \subset \{1, 2, \ldots, P\}$, where $P$ is the number of transits in the 
dataset. The Multiple Event Statistic is then constructed as: 

$$ Z = \frac{\sum_{i \in A} N(i)}{\sqrt{\sum_{i \in A} D(i)}} . \tag{A9} $$



One version of the $\chi^2$ can be constructed by focusing on the wavelet contributions to 
the Single Event Statistics. If we start now with (A7), we can make the identifications: 

$$ z_i(n) = \frac{N_i(n)}{\sqrt{D(n)}} , \tag{A10} $$

$$ q_i(n) = \frac{D_i(n)}{D(n)} , \tag{A11} $$

where now the $z_i(n)$ are the actual contributions to the SES time series from the $i$'th wavelet 
component and $q_i(n)$ are the corresponding expected contributions. Now the $\chi^2$ statistic can 
be formed: 

$$ \Delta z_i(n) = z_i(n) - q_i(n) z(n) , \tag{A12} $$

$$ \chi^2(n) = \sum_{i=1}^{M} \frac{[\Delta z_i(n)]^2}{q_i(n)} . \tag{A13} $$

Using the previously mentioned noise assumptions, this statistic should be $\chi^2$ distributed 
with $M - 1$ degrees of freedom; due to leakage between the wavelet components it turns out 
to be gamma distributed in actual practice. We have a value for this statistic at each $n$, 
so we can form a coherent statistic by adding up the points that contribute to the Multiple 
Event Statistic at times $j$ where $j \in A$. This will give us $\chi^2_{(1)}$, 

$$ \chi^2_{(1)} = \sum_{j \in A} \chi^2(j) = \sum_{j \in A} \sum_{i=1}^{M} \frac{[\Delta z_i(j)]^2}{q_i(j)} = \sum_{j \in A} \sum_{i=1}^{M} \frac{\Delta z_{ij}^2}{q_{ij}} , \tag{A14} $$

where the $\Delta z_{ij}$ and $q_{ij}$ have been introduced for notational convenience. Using the previous 
assumptions on noise and assuming a perfect match between the signal and template, this 
statistic is $\chi^2$-distributed with $P(M - 1)$ degrees of freedom. 

Another version of the $\chi^2$ statistic can be constructed by examining the $P$ temporal 
contributions to the Multiple Event Statistic. To begin, Equation A9 can be rewritten using 
the quantities defined for notational convenience: 

$$ Z = \frac{\sum_{j \in A} \sum_{i=1}^{M} N_i(j)}{\sqrt{\sum_{j \in A} \sum_{i=1}^{M} D_i(j)}} = \frac{\sum_{j \in A} \sum_{i=1}^{M} N_{ij}}{\sqrt{\sum_{j \in A} \sum_{i=1}^{M} D_{ij}}} . \tag{A15} $$

Now, choosing to examine the contributions to the Multiple Event Statistic from each $j \in A$, 

$$ Z = \sum_{j \in A} Z_j = \sum_{j \in A} \frac{\sum_{i=1}^{M} N_{ij}}{\sqrt{\sum_{k \in A} \sum_{i=1}^{M} D_{ik}}} , \tag{A16} $$

$$ Q_j = \frac{\sum_{i=1}^{M} D_{ij}}{\sum_{k \in A} \sum_{i=1}^{M} D_{ik}} , \tag{A17} $$

where now $Z_j$ are the actual temporal contributions to the Multiple Event Statistic and the 
$Q_j$ are the expected contributions. Now, $\chi^2_{(2)}$ can be constructed: 

$$ \Delta Z_j = Z_j - Q_j Z , \tag{A18} $$

$$ \chi^2_{(2)} = \sum_{j \in A} \frac{\Delta Z_j^2}{Q_j} . \tag{A19} $$

Under the previous noise assumptions, this statistic is $\chi^2$-distributed with $P - 1$ degrees 
of freedom. Since we have summed over the wavelet contributions prior to computing this 
statistic it avoids the leakage issue and turns out to be a much more powerful discriminator. 
Note that dozens of other versions of the chi-square veto have been formulated and 
investigated with real data, and indeed an infinity of such statistics exists. These two versions give 
us the greatest detection efficiency while simultaneously minimizing the false alarm rate. 

The results quoted in what follows are subject to a subtle issue discovered after the 
Q1-Q12 run was completed. The whitening coefficients in the calculation should be robust 
against the presence of a signal in the data since they are computed using a moving 
circular median absolute deviation. However, the $\chi^2$ statistics are very sensitive to any signal 
dependence of the whitening coefficients, however small it may be, due to the way in which 
they are constructed. The code is now being re-written so that in-transit cadences are first 
gapped and filled to re-compute the whitening coefficients for use in the $\chi^2$ calculation. This 
should explicitly remove the signal dependence and give us more vetoing power. 

Based on analysis of known true-positive and expected false-positive targets, TPS uses 
the following discriminators in vetoing false-positive detections: 

$$ X_{(1)} = \frac{Z}{\sqrt{\chi^2_{(1)} / [P(M-1)]}} , \tag{A20} $$

$$ X_{(2)} = \frac{Z}{\sqrt{\chi^2_{(2)} / (P-1)}} . \tag{A21} $$

In words, the Multiple Event Statistic for a possible detection is divided by the square-root 
of the reduced chi-square for each of the chi-square statistics computed above, resulting in 
two discriminators. 
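
The two discriminators can be sketched end to end as follows. This is an illustration under stated assumptions, not the TPS code: the inputs are taken to be the per-band, per-transit contributions to N and D at the transit cadences, each chi-square is assumed to weight squared differences by the inverse of the expected fraction, and the function and variable names are hypothetical.

```python
import math

def chi_square_discriminators(N, D):
    """Return (Z, chi2_1, chi2_2, disc_1, disc_2).

    N[i][j], D[i][j]: contributions from wavelet band i (M bands) at transit
    j (P transits). Requires M > 1, P > 1, and nonzero chi-square values.
    """
    M, P = len(N), len(N[0])
    D_total = sum(sum(row) for row in D)
    Z = sum(sum(row) for row in N) / math.sqrt(D_total)  # Multiple Event Statistic

    chi2_1 = 0.0  # wavelet-band version, P * (M - 1) degrees of freedom
    chi2_2 = 0.0  # temporal version, P - 1 degrees of freedom
    for j in range(P):
        N_j = sum(N[i][j] for i in range(M))
        D_j = sum(D[i][j] for i in range(M))
        z_j = N_j / math.sqrt(D_j)            # single event statistic at transit j
        for i in range(M):
            z_ij = N[i][j] / math.sqrt(D_j)   # actual band contribution
            q_ij = D[i][j] / D_j              # expected fractional contribution
            chi2_1 += (z_ij - q_ij * z_j) ** 2 / q_ij
        Z_j = N_j / math.sqrt(D_total)        # actual temporal contribution
        Q_j = D_j / D_total                   # expected temporal fraction
        chi2_2 += (Z_j - Q_j * Z) ** 2 / Q_j

    # MES divided by the square root of each reduced chi-square.
    disc_1 = Z / math.sqrt(chi2_1 / (P * (M - 1)))
    disc_2 = Z / math.sqrt(chi2_2 / (P - 1))
    return Z, chi2_1, chi2_2, disc_1, disc_2

# Toy input with M = 2 bands and P = 2 transits.
N = [[2.0, 2.0], [0.0, 4.0]]
D = [[1.0, 1.0], [1.0, 1.0]]
Z, chi2_1, chi2_2, disc_1, disc_2 = chi_square_discriminators(N, D)
```

In this toy case the signal is unevenly distributed across both bands and transits, so each reduced chi-square exceeds 1 and both discriminators fall below the Multiple Event Statistic itself.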



B. Ephemeris-Matching Calculation Used in KOI-TCE Comparisons 



Consider a TCE which is characterized by its period $T_{\rm TCE}$, epoch $t_{\rm TCE}$, and trial transit 
pulse duration $D$; on the same target star, consider a KOI which is characterized by its 
period $T_{\rm KOI}$ and epoch $t_{\rm KOI}$. The following calculation can be used to determine whether 
the two ephemerides represent a good match or a poor match in transit timing. 

First, of the two periods, define $T_{\rm short}$ to be the shorter, and $t_{\rm short}$ to be the corresponding 
epoch (i.e., if the KOI has a shorter period, then $T_{\rm short} = T_{\rm KOI}$ and $t_{\rm short} = t_{\rm KOI}$); define $T_{\rm long}$ 
and $t_{\rm long}$ to be the period and epoch of the ephemeris with the longer period. The ephemeris 
matching parameter is the fraction of transits predicted by $(T_{\rm short}, t_{\rm short})$ which fall within 
$D/2$ of one of the transits predicted by $(T_{\rm long}, t_{\rm long})$. 
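
A minimal sketch of this calculation is given below. The function name and the explicit observation window (`t_start`, `t_end`) are assumptions introduced for illustration, all times are taken to be in days, and for simplicity the long-period transits are not restricted to the window.

```python
import math

def ephemeris_match(period_a, epoch_a, period_b, epoch_b, duration,
                    t_start, t_end):
    """Fraction of short-period transits within duration/2 of a long-period one."""
    if period_a <= period_b:
        (T_s, t_s), (T_l, t_l) = (period_a, epoch_a), (period_b, epoch_b)
    else:
        (T_s, t_s), (T_l, t_l) = (period_b, epoch_b), (period_a, epoch_a)

    # Indices of the short-period transits that fall inside the window.
    k_min = math.ceil((t_start - t_s) / T_s)
    k_max = math.floor((t_end - t_s) / T_s)
    n_total = k_max - k_min + 1
    if n_total <= 0:
        return 0.0

    n_matched = 0
    for k in range(k_min, k_max + 1):
        t = t_s + k * T_s
        # Time offset from the nearest transit of the long-period ephemeris.
        phase = (t - t_l) % T_l
        if min(phase, T_l - phase) <= duration / 2.0:
            n_matched += 1
    return n_matched / n_total

# Identical ephemerides, then a 2:1 period harmonic sharing the same epoch.
same = ephemeris_match(10.0, 2.5, 10.0, 2.5, 0.5, 0.0, 100.0)
harmonic = ephemeris_match(5.0, 2.5, 10.0, 2.5, 0.5, 0.0, 100.0)
```

In this example identical ephemerides give a match of 1, while the 2:1 harmonic gives 0.5 because only every other short-period transit coincides with a long-period transit.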

The reason for using the fraction of short-period transits which are predicted is that 
there will always be more short-period transits than long-period ones. In the case of an 
extremely large mismatch in periods between the two ephemerides (for example, a 3 day and a 
300 day period), it is possible for all of the longer-period transits to fall close to transits of 
the shorter period, but the reverse is not true. Thus, in cases of extreme mismatch in period, 
using the fraction of short-period transits as the metric ensures that the matching parameter 
has a low value, whereas the fraction of long-period transits which fall near a short-period 
transit can be large, and thus use of the long-period transits in this way could result in a 
large value of the matching parameter even though the ephemerides are wildly mismatched. 

The duration of the trial transit pulse must be included because the finite pulse width 
and the finite duration of a real transit result in a family of nearly-degenerate (period,epoch) 
combinations. For example, a dataset which contains 3 transits of 13 hour duration at 365 
day period would be well-matched by a model transit with 365 day period, but almost equally 
well by a transit with 364.9 day period or 365.1 day period. The matching parameter takes 
this degeneracy into account by requiring that the short-period transits be within one-half 
of a trial transit duration of the long-period transits. The duration "smearing" is applied 
to the longer-period ephemeris because, in a case with a huge period mismatch, applying 
it to the short-period ephemeris could result in duty-cycle problems. For example, consider 
the match between a 365 day period ephemeris with 13 hour duration and a 1 day period 
ephemeris. Applying the pulse duration smearing to the short-period ephemeris would result 
in a duty cycle greater than 0.5; applying the smearing to the long-period ephemeris ensures 
that such absurd combinations of parameters do not occur. 

Fig. 1. — Histogram of number of quarters of observation for all targets. The significant 
number of targets observed for 10 quarters out of 12 is primarily due to an onboard electronics 
failure which prevents readout from 4 out of the 84 CCD modules on the focal plane, resulting 
in a "blind spot" which rotates through the field of view as Kepler rotates about its axis. 

Fig. 2. — Distribution of the number of strong features in each flux time series, as defined in 
the text. The final bin includes overflows: there are a total of 327 targets with 10 features 
and 4,650 with more than 10 features. 

Fig. 3. — Epoch and period of the 18,427 TCEs detected in the 12-quarter TPS run. Periods 
are in days, epochs are in Kepler-modified Julian Date (KJD), see text for definition. 

Fig. 4. — Distribution of TCE periods. The excess of detections at periods close to 1 year 
is due to the rotation of a small number of image artifact channels about the focal plane as 
Kepler rotates about its axis. 

Fig. 5. — Participation of Kepler output channels in TCEs with periods between 300 and 
400 days. The sub-plots are all oriented such that modules 2, 3, and 4 are at the top. 
The strongest contributions come from Module 17, Output 2, which is known to exhibit 
temperature-dependent noise artifacts. Other strong contributors shown are Module 9, 
Output 2; Module 13, Output 4; and Module 18, Output 2. All of these channels are also known 
to exhibit unusually elevated noise, though not at the level of Module 17, Output 2. 

Fig. 6. — Distribution of TCE periods and Multiple Event Statistics. 

Fig. 7. — Distribution of Multiple Event Statistics. Left: 17,568 TCEs with Multiple Event 
Statistic of 100 or lower. Right: 15,018 TCEs with Multiple Event Statistic of 20 or lower. 







Fig. 8. — Distribution of periods. Left: 5,045 TCEs with periods greater than 15 days, with 
the data anomaly-driven excess at approximately 1 year clearly visible. Right: 13,382 TCEs 
with periods less than 15 days. 

Fig. 9. — Distribution of estimated transit depths. Top: 16,115 signals with estimated depth 
of 1,000 parts per million (PPM) or less; bottom: 8,068 signals with 100 PPM or less. 

Fig. 10. — Distribution of transit duty cycles. Top: all TCEs. Bottom: 7,729 TCEs with 
transit duty cycle below 0.04. 

Fig. 11. — Relationship between period and transit duty cycle for all TCEs. The structure 
observed is driven by the fact that TPS uses a small number of fixed trial transit pulse 
durations for its searches, and by the fact that at a given trial transit pulse duration the 
transit duty cycle is inversely proportional to the TCE period. 







Fig. 12. — Maximum Multiple Event Statistic as a function of period for a sample target. 
In this target, the KOI period of 3.766 days is shown at the marker, with a Multiple Event 
Statistic of 11.66 σ. One or more artifacts in the flux time series are causing the large number 
of larger Multiple Event Statistic values at longer periods. Because of the 1,000 iteration 
limit on rejecting strong signals and re-searching for better but weaker signals, this KOI is 
not detected: the 1,000 iterations are exhausted before all of the false alarms in the figure 
can be rejected. 



Fig. 13. — Value of the ephemeris-match parameter described in the text across all 2,608 
TCEs which are matched to known KOIs. Only 113 of the values are not identically equal 
to 1. 

Fig. 14. — Distribution of transit duty cycles for TCEs successfully matched with KOIs. 
Top: 2,495 cases in which the ephemeris match is identically equal to 1. Bottom: 2,205 
cases in which the ephemeris match is identically equal to 1 and the transit duty cycle is less 
than 0.04. 